Iterative Integration of Unsupervised Features for Chinese Dependency Parsing
نویسندگان
چکیده
Since Chinese dependency parsing is lack of a large amount of manually annotated dependency treebank. Some unsupervised methods of using large-scale unannotated data are proposed and inevitably introduce too much noise from automatic annotation. In order to solve this problem, this paper proposes an approach of iteratively integrating unsupervised features for training Chinese dependency parsing model. Considering that more errors occurred in parsing longer sentences, this paper divide raw data according to sentence length and then iteratively train model. The model trained on shorter sentences will be used in the next iteration to analyze longer sentences. This paper adopts a character-based dependency model for joint word segmentation, POS tagging and dependency parsing in Chinese. The advantage of the joint model is that one task can be promoted by other tasks during processing by exploring the available internal results from the other tasks. The higher accuracy of the three tasks on shorter sentences can bring about higher accuracy of the whole model. This paper verified the proposed approach on the Penn Chinese Treebank and two raw corpora. The experimental results show that F1-scores of the three tasks were improved at each iteration, and F1score of the dependency parsing was increased by 0.33%, compared with the conventional method.
منابع مشابه
An improved joint model: POS tagging and dependency parsing
Dependency parsing is a way of syntactic parsing and a natural language that automatically analyzes the dependency structure of sentences, and the input for each sentence creates a dependency graph. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...
متن کاملتأثیر ساختواژهها در تجزیه وابستگی زبان فارسی
Data-driven systems can be adapted to different languages and domains easily. Using this trend in dependency parsing was lead to introduce data-driven approaches. Existence of appreciate corpora that contain sentences and theirs associated dependency trees are the only pre-requirement in data-driven approaches. Despite obtaining high accurate results for dependency parsing task in English langu...
متن کاملCRF Autoencoder for Unsupervised Dependency Parsing
Unsupervised dependency parsing, which tries to discover linguistic dependency structures from unannotated data, is a very challenging task. Almost all previous work on this task focuses on learning generative models. In this paper, we develop an unsupervised dependency parsing model based on the CRF autoencoder. The encoder part of our model is discriminative and globally normalized which allo...
متن کاملUnsupervised Dependency Parsing using Reducibility and Fertility features
This paper describes a system for unsupervised dependency parsing based on Gibbs sampling algorithm. The novel approach introduces a fertility model and reducibility model, which assumes that dependent words can be removed from a sentence without violating its syntactic correctness.
متن کاملThe AI-KU System at the SPMRL 2013 Shared Task : Unsupervised Features for Dependency Parsing
We propose the use of the word categories and embeddings induced from raw text as auxiliary features in dependency parsing. To induce word features, we make use of contextual, morphologic and orthographic properties of the words. To exploit the contextual information, we make use of substitute words, the most likely substitutes for target words, generated by using a statistical language model. ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2016